Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora
Abstract
A word's sentiment depends on the domain in which it is used. Computational social science research thus requires sentiment lexicons that are specific to the domains being studied. We combine domain-specific word embeddings with a label propagation framework to induce accurate domain-specific sentiment lexicons using small sets of seed words. We show that our approach achieves state-of-the-art performance on inducing sentiment lexicons from domain-specific corpora and that our purely corpus-based approach outperforms methods that rely on hand-curated resources (e.g., WordNet). Using our framework, we induce and release historical sentiment lexicons for 150 years of English and community-specific sentiment lexicons for 250 online communities from the social media forum Reddit. The historical lexicons we induce show that more than 5% of sentiment-bearing (non-neutral) English words completely switched polarity during the last 150 years, and the community-specific lexicons highlight how sentiment varies drastically between different communities.
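The idea of combining word embeddings with label propagation from a few seed words can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic random-walk-with-restart propagation over a k-nearest-neighbour cosine-similarity graph, and the function name `propagate_polarity`, the parameters `k`, `beta`, and `iters`, and the toy two-dimensional "embeddings" are all hypothetical choices made for illustration.

```python
import numpy as np


def propagate_polarity(embeddings, pos_seeds, neg_seeds, k=2, beta=0.9, iters=50):
    """Score words in [0, 1] (1 = positive) by propagating seed labels.

    embeddings: dict mapping word -> 1-D vector (assumed input format).
    k, beta, iters: illustrative hyperparameters, not values from the paper.
    """
    words = sorted(embeddings)
    index = {w: i for i, w in enumerate(words)}

    # Cosine similarity between all pairs of words.
    X = np.array([embeddings[w] for w in words], dtype=float)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    sim = np.clip(X @ X.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)

    # Keep only each word's k most similar neighbours, then symmetrize.
    W = np.zeros_like(sim)
    for i in range(len(words)):
        nearest = np.argsort(sim[i])[-k:]
        W[i, nearest] = sim[i, nearest]
    W = np.maximum(W, W.T)

    # Row-normalize the edge weights into a transition matrix.
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    T = W / row_sums

    def walk(seeds):
        # Random walk with restart: mass keeps returning to the seed words.
        start = np.zeros(len(words))
        start[[index[w] for w in seeds]] = 1.0 / len(seeds)
        p = start.copy()
        for _ in range(iters):
            p = beta * T.T @ p + (1 - beta) * start
        return p

    p_pos, p_neg = walk(pos_seeds), walk(neg_seeds)
    score = p_pos / (p_pos + p_neg + 1e-12)
    return dict(zip(words, score))


# Toy vectors standing in for domain-specific embeddings (made up for this sketch).
toy = {
    "good": [1.0, 0.1], "great": [1.0, 0.2], "lovely": [0.9, 0.15],
    "bad": [0.1, 1.0], "awful": [0.2, 1.0], "terrible": [0.15, 0.9],
}
scores = propagate_polarity(toy, pos_seeds=["good"], neg_seeds=["bad"])
```

In this sketch, a word reached mainly from the positive seeds scores close to 1 and a word reached mainly from the negative seeds scores close to 0. Training the embeddings on a different domain's corpus changes the graph, and therefore the induced lexicon, without changing the seed words.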
Related resources
Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process. Our experiments on Englis...
Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams and Exploiting Gender Language Differences on Twitter
We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process. Our experiments on Englis...
Two-Step Model for Sentiment Lexicon Extraction from Twitter Streams
In this study we explore a novel technique for creation of polarity lexicons from the Twitter streams in Russian and English. With this aim we make preliminary filtering of subjective tweets using general domain-independent lexicons in each language. Then the subjective tweets are used for extraction of domain-specific sentiment words. Relying on co-occurrence statistics of extracted words in a...
Inducing Lexicons of Formality from Corpora
The spectrum of formality, in particular lexical formality, has been relatively unexplored compared to related work in sentiment lexicon induction (Turney and Littman, 2003). In this paper, we test in some detail several corpus-based methods for deriving real-valued formality lexicons, and evaluate our lexicons using relative formality judgments between word pairs. The results of our evaluation...
Building Affective Lexicons from Specific Corpora for Automatic Sentiment Analysis
Automatic sentiment analysis in texts has attracted considerable attention in recent years. Most of the approaches developed to classify texts or sentences as positive or negative rest on a very specific kind of language resource: emotional lexicons. To build these resources, several automatic techniques have been proposed. Some of them are based on dictionaries while others use corpora. One of...
Journal title:
- Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing
Volume: 2016 (issue not listed)
Pages: not listed
Publication date: 2016